perf: optimize Hunyuan DiT Ulysses and non-attention paths by starrkk · Pull Request #1200 · ModelTC/LightX2V

starrkk · 2026-06-30T05:01:31Z

Summary

add optional Hunyuan DiT non-attention torch.compile wrappers controlled by model config
add split QKV input/output support for Ulysses attention through explicit config/API parameters
add optional async text gather and bounded text gather buffer reuse without inference-operator profiling hooks
wire shared QKV activation quantization into the Hunyuan transformer path through model config

Why

These changes reduce Python/tensor layout overhead around HunyuanVideo DiT inference and let Hygon DCU deployments reuse activation quantization for consecutive Q/K/V projections. Runtime choices are now passed through config/API parameters instead of environment-variable switches.
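To make the idea of reusing activation quantization across consecutive Q/K/V projections concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the PR's implementation: the names `quantize_per_token`, `int8_linear`, and `shared_qkv` are hypothetical, nested lists stand in for tensors, and the weights stay in float for brevity.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]
Quantized = Tuple[List[List[int]], List[float]]


def quantize_per_token(x: Matrix) -> Quantized:
    """Symmetric int8 quantization with one dynamic scale per row (token)."""
    q_rows, scales = [], []
    for row in x:
        amax = max(abs(v) for v in row)
        scale = amax / 127.0 if amax > 0 else 1.0
        q_rows.append([round(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales


def int8_linear(x_q: List[List[int]], scales: List[float], weight: Matrix) -> Matrix:
    """Compute dequant(x_q) @ weight.T, with weight laid out as [out][in]."""
    return [
        [scale * sum(q * w for q, w in zip(q_row, w_row)) for w_row in weight]
        for q_row, scale in zip(x_q, scales)
    ]


def shared_qkv(
    x: Matrix,
    wq: Matrix,
    wk: Matrix,
    wv: Matrix,
    quantize: Callable[[Matrix], Quantized] = quantize_per_token,
) -> Tuple[Matrix, Matrix, Matrix]:
    """Quantize the activation once and reuse it for all three projections."""
    x_q, scales = quantize(x)
    q = int8_linear(x_q, scales, wq)
    k = int8_linear(x_q, scales, wk)
    v = int8_linear(x_q, scales, wv)
    return q, k, v
```

The saving comes from calling `quantize` once instead of three times; the three projections still run separately.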

Validation

branch rebuilt on latest ModelTC/LightX2V:main (89dfa833)
ruff check --config=pyproject.toml passed for the touched files
ruff format --check --config=pyproject.toml passed for the touched files
python -m py_compile passed for the touched files
source check confirmed no LIGHTX2V_* env switches or profiler ranges remain in the touched operator files
validated as part of the HunyuanVideo1.5 I2V 8-card benchmark path on Hygon DCU

gemini-code-assist

Code Review

This pull request introduces several performance optimizations for Ulysses attention and Hunyuan Video transformer inference, including support for split QKV inputs/outputs to avoid copy overhead, asynchronous text gathering, buffer reuse, shared dynamic activation quantization, and optional torch.compile support for non-attention branches. The reviewer identified several critical issues: potential NCCL hangs due to overlapping collective operations on the same process group, high compilation overhead and cache thrashing from compiling functions with custom weight objects, an unbounded memory leak in the text gather buffer cache under dynamic prompt lengths, incorrect text mask length calculation when using split QKV inputs, and a potential AttributeError in the shared quantization check if key/value weights lack the expected quantization methods.
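One possible way to bound the text gather buffer cache flagged above is a least-recently-used cache keyed by buffer shape. The sketch below is an assumption about how such a bound could look, not the code in this PR: `BoundedBufferCache`, `max_entries`, and the `alloc` callback are hypothetical names, and a `bytearray` stands in for a device tensor.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class BoundedBufferCache:
    """Reuse buffers by key, evicting the least recently used entry when full."""

    def __init__(self, alloc: Callable[[Hashable], object], max_entries: int = 4):
        if max_entries < 1:
            raise ValueError("max_entries must be >= 1")
        self._alloc = alloc
        self._max_entries = max_entries
        self._buffers: "OrderedDict[Hashable, object]" = OrderedDict()

    def get(self, key: Hashable) -> object:
        if key in self._buffers:
            # Cache hit: mark the entry as most recently used.
            self._buffers.move_to_end(key)
            return self._buffers[key]
        buf = self._alloc(key)
        self._buffers[key] = buf
        if len(self._buffers) > self._max_entries:
            # Drop the oldest entry so memory stays bounded
            # even when prompt lengths keep changing.
            self._buffers.popitem(last=False)
        return buf

    def __len__(self) -> int:
        return len(self._buffers)
```

With dynamic prompt lengths, each new shape allocates at most one buffer and the cache never holds more than `max_entries` of them.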

gemini-code-assist · 2026-06-30T05:04:54Z

+        if split_qkv_input:
+            img_qkv_len = img_q.shape[0]
+            txt_qkv_len = txt_q.shape[0]
+            txt_mask_len = None


When split_qkv_input is True, txt_mask_len is unconditionally set to None. However, if len(cu_seqlens_qkv) == 3, there is a text mask, and txt_mask_len should be computed as cu_seqlens_qkv[2] - slice_qkv_len to ensure correctness for other models or configurations using this generic Ulysses attention implementation.

Suggested change

-        if split_qkv_input:
-            img_qkv_len = img_q.shape[0]
-            txt_qkv_len = txt_q.shape[0]
-            txt_mask_len = None
+        if split_qkv_input:
+            img_qkv_len = img_q.shape[0]
+            txt_qkv_len = txt_q.shape[0]
+            txt_mask_len = cu_seqlens_qkv[2] - slice_qkv_len if len(cu_seqlens_qkv) == 3 else None
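The suggested expression can be isolated as a small helper for checking by hand. This is only a sketch: `text_mask_len` is a hypothetical name, plain integers stand in for tensor values, and the example lengths in the usage note are made up.

```python
from typing import Optional, Sequence


def text_mask_len(cu_seqlens_qkv: Sequence[int], slice_qkv_len: int) -> Optional[int]:
    """Return the text mask length when a third cumulative entry exists."""
    if len(cu_seqlens_qkv) == 3:
        # Third entry minus the slice length, as in the suggestion above.
        return cu_seqlens_qkv[2] - slice_qkv_len
    # Two entries: no text mask is present.
    return None
```

For example, `text_mask_len([0, 1100, 1256], 1000)` gives 256, while a two-entry `cu_seqlens_qkv` gives `None`.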

helloyongyang · 2026-07-01T05:03:06Z

Please do not include profiling code in inference operators.

Please do not use environment variables as switches.

starrkk · 2026-07-01T10:01:17Z

Addressed in 59ddfb8.

Removed profiler ranges from the Ulysses attention/all-to-all inference operator paths.
Removed LIGHTX2V_* environment-variable switches from the touched operator files.
Runtime choices are now passed through model config / explicit API parameters instead of process environment variables.
Updated the PR description to reflect the new config/API-based control path.
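As an illustration of config-driven control, the following sketch reads switches from a config mapping instead of the process environment. It assumes a layout that is not confirmed by the PR: the `ulysses` section and the option names other than `split_qkv_input` are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class UlyssesOptions:
    """Runtime switches carried explicitly instead of read from os.environ."""

    split_qkv_input: bool = False
    split_output: bool = False            # hypothetical option name
    async_text_gather: bool = False       # hypothetical option name
    text_gather_buffer_limit: int = 4     # hypothetical option name

    @classmethod
    def from_config(cls, config: Mapping[str, Any]) -> "UlyssesOptions":
        section = dict(config.get("ulysses", {}))
        known = {f.name for f in fields(cls)}
        unknown = set(section) - known
        if unknown:
            # Fail loudly on typos rather than silently ignoring a switch.
            raise KeyError(f"unknown ulysses options: {sorted(unknown)}")
        return cls(**section)
```

Because the options object is passed as an argument, two models in the same process can run with different settings, which environment variables cannot express.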

zhenggf added 3 commits June 30, 2026 11:50

Optimize Hunyuan DiT non-attention compile

a0397e8

(cherry picked from commit 8f06fb6c7e0859f432a329a84f8d5d8e3a386ad1)

ulysses: add split qkv and async text gather

e7a3846

Support split image/text QKV inputs, optional split attention outputs, async text all_gather, and profiler ranges for Ulysses sequence-parallel attention. (cherry picked from commit 8bb7c3e1784140a8f6d372fe429b468e3a502b8b)

hunyuan: wire shared qkv quantization and split attention

0430c86

Reuse dynamic activation quantization across consecutive Q/K/V projections and route split image/text tensors through the Ulysses attention path when enabled. (cherry picked from commit 61c5df5c20106254d5294b910cdf3d1780970a97)

gemini-code-assist Bot reviewed Jun 30, 2026


fix: harden Hunyuan DiT optimization switches

b7d6fa6

starrkk marked this pull request as ready for review June 30, 2026 09:25

zhenggf and others added 2 commits July 1, 2026 14:59

style: format Hunyuan DiT optimization changes

b215bb3

refactor: remove env switches from Hunyuan DiT optimizations

59ddfb8

starrkk wants to merge 6 commits into ModelTC:main from starrkk:codex/hunyuan-dit-ulysses-optimizations


2 participants
